The following content has been provided by the University of Erlangen-Nürnberg.
OK, let's start. Welcome to lecture number three of machine learning for physicists.
I want to start by pointing you to the website. In case you haven't noticed, I'm uploading the lecture notes on the website.
So you can find them here in the section, lecture notes and files.
And I even try to upload them before the lecture, so you will find the lecture notes for this particular lecture already online.
Also on this website I have placed what I call a cheat sheet for Python.
So the idea is that if you have never learned Python or never done any programming language, this is something where you can get a quick start.
I am also distributing the cheat sheet among you; there should certainly be copies left.
I printed 100 of them, so maybe they are going around somewhere. Please pass them along, because people are still entering and they might also want a copy.
And I encourage you just to use, say, the Jupyter notebook that you installed in your Python installation to execute some of these commands.
Just go through all the commands, type them in, see what happens, whether it makes sense to you, whether you can change the command and it does what you expect.
And so you will quickly learn Python and the rest you can then understand by going to a website such as python.org.
Also with respect to the homework, we don't do tutorials, but after this lecture I'm available if you have questions about your solutions for the homework that I gave you last time.
So this is how we can do it.
Okay, so now today we come to the heart of the matter, and that is the so-called backpropagation algorithm.
And just to encourage you, this is the landscape. We have discussed the structure of a simple neural net, how to feed in some input, how to get the output by repeated application of matrix multiplication and some nonlinear function.
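To make this concrete, here is a minimal sketch of such a feedforward pass in Python. The layer sizes, the sigmoid nonlinearity, and the random weights are illustrative choices, not values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    # an example nonlinear function applied elementwise
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative network: input (3) -> hidden (5) -> output (2).
# W1, W2 are the weight matrices, b1, b2 the bias vectors.
W1, b1 = rng.standard_normal((5, 3)), rng.standard_normal(5)
W2, b2 = rng.standard_normal((2, 5)), rng.standard_normal(2)

def network(x):
    # repeated matrix multiplication followed by the nonlinearity
    hidden = sigmoid(W1 @ x + b1)
    return sigmoid(W2 @ hidden + b2)

y = network(np.array([0.5, -1.0, 2.0]))
print(y.shape)  # (2,)
```

Typing this into a Jupyter notebook and varying the layer sizes is a good way to see how the input vector is transformed layer by layer.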
We then have briefly discussed last time, and I will repeat it today, how to adapt the weights using an algorithm called gradient descent, more specifically stochastic gradient descent.
And today we will discuss backpropagation.
Backpropagation is how you efficiently find the derivatives that you need when you try to go down the hill, that is, when you do stochastic gradient descent in order to minimize the cost function.
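The "going downhill" picture can be sketched in a few lines: once you have the derivative of the cost with respect to a parameter, you repeatedly step against it. The toy cost, learning rate, and starting point below are made-up illustrative choices:

```python
# Sketch of gradient descent on a toy one-parameter cost C(w) = (w - 2)^2.
def cost(w):
    return (w - 2.0) ** 2

def grad(w):
    # analytic derivative dC/dw; backpropagation is the efficient way
    # to get such derivatives for the many parameters of a real network
    return 2.0 * (w - 2.0)

w, eta = 10.0, 0.1   # illustrative start value and learning rate
for _ in range(100):
    w -= eta * grad(w)   # step downhill along the negative gradient

print(round(w, 4))  # 2.0, the minimum of the cost
```

In a real network the same update is applied to every weight and bias, with the gradients supplied by backpropagation rather than written down by hand.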
So when you want to train your neural network, backpropagation is what you want to do.
And you see in this drawing: once you've understood backpropagation, you are up on the hill, and after that it's all downhill and smooth sailing.
So pay attention today.
Okay, so first I want to remind you of what we said last time.
This is a schematic picture of a neural network.
For me, a neural network is just a very complicated nonlinear function that takes the input, which may be a vector, to the output, which is another vector.
And this function depends on parameters.
They are here written as W, because W stands for weights.
So these parameters are the strengths of all these connections, plus there are some extra parameters that we call the biases.
I will lump all of them together.
So W, when I write W, may stand for a million weights and a few thousand biases.
So it's a really large number of parameters.
You probably haven't yet dealt with functions that have so many parameters.
Okay.
Then we have to specify what is our goal, what do we want to achieve.
And typically we have some desired function that we want to implement.
We may not know it explicitly, but we may know it in terms of many examples where we know the value of the function.
And we want to train our network, as we say, to yield the correct values on these examples and then hopefully also the correct values on other examples that it has not yet seen.
So this is a representation of our neural network, f with a subscript W, where W denotes the weights.
And what we would really like to have is that the output of our neural network is approximately equal to the value of some target function, capital F.
So note there is no subscript here.
This is the real, the correct function that we want to approximate.
And in order to do this, we have to specify how far off are we, how good or bad is our approximation.
And that's where we introduce a cost function.
So the cost function simply measures how far off are we from the correct result.
And so what we would do is take the difference between the correct result and the neural network result; this is still a vector, so we take, for example, the norm of this vector and square it.
And that's a measure of how far off we are.
But it's a measure only for a particular input, and we are interested in some overall quality, so we will still average the result over all possible samples.
That may stand for all possible training samples or, more generally, for all possible samples that there could be.
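This quadratic cost, averaged over samples, looks as follows in code. Here the true function and the "network" are stand-ins chosen only for illustration (a sine and its Taylor approximation):

```python
import numpy as np

def F_target(x):
    # stand-in for the true function we want to approximate
    return np.sin(x)

def f_net(x):
    # stand-in for the network output (here: a Taylor approximation)
    return x - x**3 / 6.0

# average the squared deviation |f(x) - F(x)|^2 over many samples
samples = np.linspace(-1.0, 1.0, 100)
deviations = f_net(samples) - F_target(samples)
cost = np.mean(deviations ** 2)
print(cost)
```

The better the approximation, the smaller this averaged cost; training means adjusting the parameters so that it decreases.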
And then one challenge is, of course, that you cannot possibly do this average over a billion different examples; instead, you average over a small batch of samples at a time, which is the idea behind stochastic gradient descent.
Open access · Duration: 00:33:37 min · Recorded: 2017-05-22 · Uploaded: 2017-05-22 20:21:17 · Language: en-US